Fast multivariate empirical cumulative distribution function with connection to kernel density estimation
Abstract
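This paper revisits the problem of computing empirical cumulative distribution functions (ECDF) efficiently on large, multivariate datasets. Computing an ECDF at one evaluation point requires $\mathcal{O}(N)$ operations on a dataset composed of $N$ data points. Therefore, a direct evaluation of ECDFs at $N$ evaluation points requires a quadratic $\mathcal{O}(N^2)$ number of operations, which is prohibitive for large-scale problems. Two fast and exact methods are proposed and compared. The first one is based on fast summation in lexicographical order, with an $\mathcal{O}(N\log N)$ complexity, and requires the evaluation points to lie on a regular grid. The second one is based on the divide-and-conquer principle, with an $\mathcal{O}(N\log(N)^{(d-1)\vee 1})$ complexity, and requires the evaluation points to coincide with the input points. The two fast algorithms are described in detail for the general $d$-dimensional case, and numerical experiments validate their speed and accuracy. Secondly, the paper establishes a direct connection between cumulative distribution functions and kernel density estimation (KDE) for a large class of kernels. This connection paves the way for fast exact algorithms for multivariate kernel density estimation and kernel regression. Numerical tests with the Laplacian kernel validate the speed and accuracy of the proposed algorithms. A broad range of large-scale multivariate density estimation, cumulative distribution estimation, survival function estimation and regression problems can benefit from the proposed numerical methods.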
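The first method evaluates the ECDF $F_N(\mathbf{x}) = \frac{1}{N}\#\{j : \mathbf{x}_j \le \mathbf{x}\}$ (componentwise $\le$) at the nodes of a grid. A minimal NumPy sketch of that idea, assuming a tensor-product grid given as one sorted 1-D array per dimension: each data point is counted once in the first grid node that dominates it, and cumulative sums along every axis then accumulate those counts into the ECDF at all nodes. The helper names `ecdf_direct` and `ecdf_on_grid` are hypothetical; this is an illustration in the spirit of lexicographic summation, not the authors' implementation.

```python
import numpy as np


def ecdf_direct(data, points):
    """Direct ECDF (hypothetical helper): O(N * M * d) componentwise comparisons."""
    # le[m, j] is True when data point j is <= evaluation point m in every coordinate
    le = np.all(data[None, :, :] <= points[:, None, :], axis=2)
    return le.mean(axis=1)


def ecdf_on_grid(data, grids):
    """ECDF at every node of a tensor-product grid.

    Hypothetical helper and a sketch only, not the paper's code.
    data:  array of shape (N, d)
    grids: list of d sorted 1-D arrays, one per dimension
    """
    n, d = data.shape
    counts = np.zeros(tuple(len(g) for g in grids))
    keep = np.ones(n, dtype=bool)
    cells = []
    for k, g in enumerate(grids):
        # smallest node index i with g[i] >= x_k: the first node that dominates x_k
        i = np.searchsorted(g, data[:, k], side="left")
        keep &= i < len(g)  # points above the last node are never counted
        cells.append(i)
    # one count per data point, placed in its first dominating grid cell
    np.add.at(counts, tuple(c[keep] for c in cells), 1.0)
    # cumulative sums along each axis add up all cells below each node
    for axis in range(d):
        counts = np.cumsum(counts, axis=axis)
    return counts / n
```

For $m$ nodes per axis, binning costs $\mathcal{O}(Nd\log m)$ through `np.searchsorted`, and the cumulative sums visit each of the $m^d$ nodes once per axis, whereas `ecdf_direct` compares every data point with every node.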
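The second method evaluates the ECDF at the input points themselves. For $d = 2$, a divide-and-conquer scheme of this kind can be sketched as below, assuming NumPy and a merge-sort style recursion chosen for illustration (the helper names are hypothetical, and the paper's general $d$-dimensional algorithm is more involved). Once duplicates are merged and the distinct points are sorted lexicographically, every point dominated by $\mathbf{x}_i$ appears before it, so only the second coordinate remains to be compared; each pair is counted exactly once, at the recursion level where the two points are split. The cost is $\mathcal{O}(N\log N)$, which agrees with $\mathcal{O}(N\log(N)^{(d-1)\vee 1})$ for $d = 2$.

```python
import heapq

import numpy as np


def _count_below(order, ys, ws, acc):
    """Divide and conquer over indices that are in lexicographic order.

    Adds to acc[r] the weight of every earlier index l with ys[l] <= ys[r],
    and returns the same indices re-sorted by y (merge-sort style)."""
    if len(order) <= 1:
        return order
    mid = len(order) // 2
    left = _count_below(order[:mid], ys, ws, acc)
    right = _count_below(order[mid:], ys, ws, acc)
    # Every left index precedes every right index, so it is already dominated
    # in x; a two-pointer sweep over the y-sorted halves counts y-dominance.
    k, running = 0, 0.0
    for r in right:
        while k < len(left) and ys[left[k]] <= ys[r]:
            running += ws[left[k]]
            k += 1
        acc[r] += running
    return list(heapq.merge(left, right, key=lambda t: ys[t]))


def ecdf_at_data_2d(data):
    """ECDF of 2-D data evaluated at the data points in O(N log N).

    Hypothetical helper; a sketch, not the paper's d-dimensional algorithm."""
    n = data.shape[0]
    # distinct points, with multiplicities used as weights
    pts, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    ws = counts.astype(float)
    acc = ws.copy()  # each distinct point dominates itself and its duplicates
    # lexicographic order: first coordinate primary, second coordinate secondary
    order = np.lexsort((pts[:, 1], pts[:, 0])).tolist()
    _count_below(order, pts[:, 1], ws, acc)
    return acc[inverse.reshape(-1)] / n
```

Duplicates are merged by `np.unique` first because two identical points dominate each other, which would break the "dominated points come earlier" argument; `inverse.reshape(-1)` guards against NumPy versions that return the inverse indices with an extra dimension.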
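The connection with kernel density estimation is easiest to see for the Laplacian kernel in one dimension. The estimate $\hat f(x) = \frac{1}{2Nh}\sum_{j} e^{-|x - x_j|/h}$ splits into a left part $\sum_{x_j \le x} e^{-(x - x_j)/h} = e^{-x/h}\sum_{x_j \le x} e^{x_j/h}$, which is a weighted ECDF-type sum, and a symmetric right part. A minimal sketch, assuming 1-D data, evaluation at the data points, and a fixed bandwidth `h` (the helper names are hypothetical; the paper treats the multivariate case): after sorting, both parts follow from one sweep each.

```python
import numpy as np


def laplace_kde_direct(data, points, h):
    """Direct Laplacian-kernel KDE (hypothetical helper): O(N * M)."""
    diff = np.abs(points[:, None] - data[None, :])
    return np.exp(-diff / h).sum(axis=1) / (2.0 * h * len(data))


def laplace_kde_fast(data, h):
    """Laplacian-kernel KDE at the data points in O(N log N) (sort-dominated).

    Hypothetical 1-D sketch, not the paper's multivariate algorithm."""
    n = len(data)
    order = np.argsort(data, kind="stable")
    x = data[order]
    decay = np.exp(-np.diff(x) / h)  # exp(-(x_i - x_{i-1}) / h), each in (0, 1]
    left = np.empty(n)   # left[i]  = sum over j <= i of exp(-(x_i - x_j) / h)
    right = np.empty(n)  # right[i] = sum over j >  i of exp(-(x_j - x_i) / h)
    left[0] = 1.0
    for i in range(1, n):
        left[i] = 1.0 + decay[i - 1] * left[i - 1]
    right[n - 1] = 0.0
    for i in range(n - 2, -1, -1):
        right[i] = decay[i] * (1.0 + right[i + 1])
    dens = np.empty(n)
    dens[order] = (left + right) / (2.0 * h * n)  # undo the sort
    return dens
```

The recurrences multiply by factors $e^{-(x_i - x_{i-1})/h} \le 1$ rather than forming $e^{\pm x/h}$ directly, which would overflow once $|x|/h$ is large.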
Similar articles
Kernel estimation of multivariate cumulative distribution function
A smooth kernel estimator is proposed for multivariate cumulative distribution functions (cdf), extending the work of Yamato [H. Yamato, Uniform convergence of an estimator of a distribution function, Bull. Math. Statist. 15 (1973), pp. 69–78.] on univariate distribution function estimation. Under assumptions of strict stationarity and geometrically strong mixing, we establish that the proposed...
Efficient Estimation of the Density and Cumulative Distribution Function of the Generalized Rayleigh Distribution
The uniformly minimum variance unbiased (UMVU), maximum likelihood, percentile (PC), least squares (LS) and weighted least squares (WLS) estimators of the probability density function (pdf) and cumulative distribution function are derived for the generalized Rayleigh distribution. This model can be used quite effectively in modelling strength data and also modeling general lifetime data. It has...
Memory-Efficient Orthogonal Least Squares Kernel Density Estimation using Enhanced Empirical Cumulative Distribution Functions
A novel training algorithm for sparse kernel density estimates by regression of the empirical cumulative density function (ECDF) is presented. It is shown how an overdetermined linear least-squares problem may be solved by a greedy forward selection procedure using updates of the orthogonal decomposition in an order-recursive manner. We also present a method for improving the accuracy of the es...
Fast and Extensible Online Multivariate Kernel Density Estimation
In this paper we present xokde++, a state-of-the-art online kernel density estimation approach that maintains Gaussian mixture models on input data streams. The approach follows state-of-the-art work on online density estimation, but was redesigned with computational efficiency, numerical robustness, and extensibility in mind. Our approach produces comparable or better results than the current sta...
Empirical Testing of Fast Kernel Density Estimation Algorithms
We present results of experiments testing the Fast Gauss Transform, Improved Fast Gauss Transform, and Dual-Tree methods (using kd-tree and Anchors Hierarchy data structures) for fast Kernel Density Estimation (KDE). We examine the performance of these methods with respect to data set size, dimension, allowable error, and data set structure (“clumpiness”), measured in terms of CPU time and memo...
Journal
Journal title: Computational Statistics & Data Analysis
Year: 2021
ISSN: 0167-9473, 1872-7352
DOI: https://doi.org/10.1016/j.csda.2021.107267